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ABSTRACT 

The Herschel Space Observatory of ESA was launched in May 2009 and is in operation since. From its distant orbit around L2 it 
needs to transmit a huge quantity of information through a very limited bandwidth. This is especially true for the PACS imaging 
camera which needs to compress its data far more than what can be achieved with lossless compression. This is currently solved 
by including lossy averaging and rounding steps on board. Recently, a new theory called compressed-sensing emerged from the 
statistics community. This theory makes use of the sparsity of natural (or astrophysical) images to optimize the acquisition scheme 
of the data needed to estimate those images. Thus, it can lead to high compression factors. 

A previous article by Bobin et al. (2008) showed how the new theory could be applied to simulated Herschel/PACS data to solve 
the compression requirement of the instrument. In this article, we show that compressed-sensing theory can indeed be successfully 
applied to actual Herschel/PACS data and give significant improvements over the standard pipeline. In order to fully use the 
redundancy present in the data, we perform full sky map estimation and decompression at the same time, which cannot be done 
in most other compression methods. We also demonstrate that the various artifacts affecting the data (pink noise, glitches, whose 
behavior is a priori not well compatible with compressed-sensing) can be handled as well in this new framework. Finally, we make 
a comparison between the methods from the compressed-sensing scheme and data acquired with the standard compression scheme. 
We discuss improvements that can be made on ground for the creation of sky maps from the data. 

Key words, nstrumentation: photometers, Methods: numerical, Methods: statistical, Techniques: image processing, Galaxies: general 



1. Introduction 



The Herschel Space Observatory (jPilbratt et all 120 ldT) is a 
three and a half year mission carrying instruments to ob- 
serve in the far-infrared. It is dedicated to the investigation 
of galaxy formation and evolution mechanisms, star forma- 
tion and interaction with the interstellar medium, molecu- 
lar chemistry in the universe and finally chemical analysis 
of the atmospheres of bodies of the Solar System. With its 
3.5 m telescope, Herschel is the largest space observatory 
to date. Its three instruments are: t he Photodetecto r Arra y 
Camera and Spectrometer (PACS) (Poglits ch et all 12010). 
the Spectral and Ph otometric Imaging Receiver (SPIRE), 
(jGriffin et all 120101) . an d the Heterodyne Instr ument for 
the Far Infrared (HIFI) (|de Graauw et all l2010l) . 

Due to power and dissipation constraints, routine op- 
erations on Herschel see only one instrument powered 
per observing day (OD). A special mode exists, called 
the SPIRE+PACS parallel mode (see below), where both 
SPIRE and PACS can be operated. The duty-cycle of 
Herschel rests on the concept of the OD, which typically 
consists in a period of autonomous observations of 20-22 h, 
and a daily telecommunication period (DTCP) of 2-4 h dur- 
ing which the data recorded on board is downlinked and 
the program of the next ODs is uplinked (Herschel always 
stores on-board the program of its next two ODs, in case 
of problems during the DTCP). 

Limits to the amount of data that can be generated by 
the instruments are imposed by (1) the speed of the in- 
ternal communication links between the instruments and 



the service module, and (2) the speed and duration of the 
communication link between the satellite and its ground 
stations. These limits translates typically in an accept- 
able average data rate for the instrument during the OD 
of 130kbit/s, while the PACS photometer, featuring the 
largest number of pixels (^2500) and a high sampling fre- 
quency (40 Hz) provides a native data rate of 1600kbit/s, 
to which 40 kbit /s have to be added for housekeeping infor- 
mation. The situation is even more complex for the PACS 
spectrometer at 4000kbit/s (but the integration ramps can 
be compressed on-board in a natural fashion), while SPIRE 
and HIFI have much smaller data rates around or below 
lOOkbit/s, requiring no compression on board. 

We focus on the PACS imaging camera, which is made 
of two arrays of bolometers operating in the 55 to 210 fim 
range. Due to the above restrictions PACS data need to 
be compressed by a factor of 16 to 32. For this purpose, 
on-board data reduc tion and compression is carried out by 
dedicated sub-units (jOttensamer fc KerschbaumL l2008h . 

In this paper, we investigate alternative compression 
modes to attain such a high compression factor without sac- 
rificing noise or resolution. The study in this paper is based 
on the theory of compressed-sensing and repre sents a real- 
world implementation of ideas presented in iBobin et al.l 
(|2008l h Simply put, compressed-sensing is a promising the- 
ory which leverages the sparsity properties of dat a to allow 
measurements beyon d the Shannon-Nyquist limit (|DonohoL 
120061 : ICanded . 120081) . In the context of PACS, it offers po- 
tential to optimize on-board compression. It has the very 
strong advantage that it is not computationally expensive 
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on the acquisition side, a very important point for a space 
mission. An additional constraint for PACS is that CS had 
to be implemented after the conception of the instrument, 
while the instrument hardware cannot be changed as well 
as most of the on-board software. 

We first present the PACS instrument, describe the 
properties of its data, and explain how we model the instru- 
ment. Then, we explain how we perform map-making using 
PACS data which is a central stage in data analysis. Finally, 
we introduce our compressed-sensing implementation and 
we show how the map-making process can be modified in 
the compressed-sensing framework to estimate state-of-the- 
art or better sky maps. 

2. PACS data 

2.1. The PACS imaging camera 

The PACS imaging camera is made of two arrays of bolome- 
ters. One array is made of two 16 x 16 matrices forming a 
32 x 16 array. It is associated with the "red" filter rang- 
ing from 130 to 210 fim. The other array is made of eight 
16 x 16 matrices and thus has a shape of 64 x 32 pixels. It is 
associated with the filters "green" and "blue" ranging re- 
spectively from 85 to 130 /im and from 60 to 85 /im. Figure 
([3]) shows a single frame of the PACS blue array. 

The PACS camera can be used in two modes: the chop- 
nod mode and the scan mode. The chop-nod mode consists 
in using an internal chopper to alternate between two sky 
positions (ON and OFF) and the satellite to exchange the 
ON and OFF beam positions. It was originally dedicated to 
the observation of point sources, with the chop-nod modu- 
lation used to get rid of the low-frequency noise originating 
in the detector. However it turned out that even for point 
sources, scan-mode observations are more efficient in terms 
of achieved sensitivity for a given observing time. Therefore 
this mode is not considered here. The scan mode scans the 
sky in order to observe larger areas than the detector field- 
of-view. When PACS is the only instrument powered, the 
prime mode, there are three possible scan speeds: 10, 20 
and 60 arcsec/s. The PACS camera can also work in scan 
mode in conjunction with SPIRE, the parallel mode, in 
which case observations are driven by SPIRE requirements 
and the scan speed is either 30 or 60 arcsec/s. Most of the 
PACS data are therefore obtained in scan mode, either in 
prime or parallel mode, and in any case, the bolometer ar- 
rays are read at 40 Hz. 

The data are compressed using a lossless entropic com- 
pression scheme which yields up to a factor of 4, requiring 
another factor of 4 (8 in parallel mode) to be achieved with 
lossy steps. For this purpose, 4 (or 8) consecutive frames 
are averaged before lossless compression. Additionally, a 
bit-rounding step is performed. It consists in rounding the 
averaging that is performed on board so that the number 
of bits required to encode the data is reduced by 1 or 2 
bits. This rounding is randomly up and down so as to not 
bias the results. In prime mode, one bit is removed while in 
parallel mode two bits are removed. We will not consider 
the bit-rounding operation in this article. In parallel mode 
an averaging of 8 frames is required because the satellite 
bandwidth is shared with SPIRE. In fast parallel mode, 
this averaging results in a blurring of the frames and thus 
a loss of resolution in the sky maps which is highly un- 
desirable. Indeed, in this mode the scan speed can be 60 



arcsec/s, or 18.75 pixels/s. Frames are taken at 40 Hz, thus 
the displacement between two frames is approximately 0.5 
pixel. In parallel mode 8 frames are averaged with a total 
motion of around 4 pixels. This is very significant compared 
to the point-spread function (PSF) in the blue filter which 
has a full-width at half maximum (FWHM) of 1.6 pixels. 

This blurring is a very strong incentive to investigate 
other compression schemes. Presently, it can be avoided 
only by configuring the so-called transparent compression 
mode, where only one-eighth of the frame area is trans- 
mitted, but at full temporal resolution. With this mode we 
can experiment with all the stages of the acquisition while 
incorporating all features and artifacts of the actual instru- 
ment. 

Indeed, care must be taken when using a new acquisition 
or compression mode that it will not decrease our ability to 
separate the signal of interest from instrumental artifacts. 
PACS imaging data have some properties that have to be 
taken into consideration. The most salient one is a large 
signal offset that varies for each pixel. The dispersion of 
the offset among the detectors occupies half of the range of 
values allowed by the 16 bits digitisation, whereas the typ- 
ical signal range is around 15 bits with a noise entropy of 
4-6 bits. This noise measurement is not a theoretical value, 
but the entropy of decorrelated measured data. It has been 
determined in two ways : computing the Shannon entropy 
of the difference signal minus 0.5 bit (Differentiation in- 
creases the sigma by a factor y(2) or half a bit in entropy, 
which needs to be subtracted to get the original value) and 
computing the Shannon entropy of the high-frequency co- 
efficients of a wavelet (for example Haar) transform. Both 
methods give almost the same value. This is essentially the 
readout noise, so it does not take into account the pink 
component of the noise which can be seen at longer time 
scales. Hence, the quantization noise is not negligible with 
respect to the signal and adds to the other sources of noise. 
As can be seen in Figure [3^, the signal of interest is not 
visible when the offset is not removed and we cannot sub- 
tract it on board. If the offset is removed (Figure |3b), it is 
still hard to detect physical structure because of the noise. 
Some glitches can be identified however that could not be 
seen in Figure^. It corresponds to the high-valued isolated 
pixels. 

Furthermore, there is a drift of this offset (Figure [IJ 
which can be seen as pink noise (1// noise). More pre- 
cisely, it is a l// 1 / 2 noise as its power spectrum decreases 
with the inverse of the square root of the frequency. One 
of the main issues in PACS data processing is the separa- 
tion of the signal from this drift. Its amplitude is of the 
order of magnitude of the signal of interest. In addition, 
this pink noise is affecting all the temporal frequencies in 
the PACS bandwidth. In other words, the transition be- 
tween pink noise and white noise is happening at higher 
frequencies than the sampling frequency of PACS. 

Once the offset is removed, other features become visi- 
ble, as can be seen in Figure[TJ For instance, the bad behav- 
ior of a whole line of pixels results in frequent discontinuities 
of the intensity as a function of time (pixels 176 to 192 in 
the figure). A cosmic ray hit in the electronics of the last 
line around the time index 45000 results in an additional 
drift of these pixels. Other cosmic rays can hit individual 
detectors and be the cause of wrong intensity values (out- 
liers). However, those cosmic ray hits are affecting less than 
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Fig. 1. Data set of the PACS imaging camera in the blue 
band in the transparent mode. This set consists of two 
cross-scans. The offset has been removed. All pixels have 
been reordered in a vector shape and displayed as a func- 
tion of time (so that each pixel is shown as a horizontal 
line). Horizontal structures are the drift of the offset. The 
discontinuity in the offsets at the center of the figure comes 
from the temporal gap between the two cross-scans. An un- 
stable line (pixels 176 to 192) results in frequent intensity 
discontinuities along both scans. At time index 45000 the 
additional drift is due to a cosmic ray hit in the multiplexing 
electronics affecting a whole line. Finally, the faint vertical 
structures are due to the bright galaxy nucleus passing in 
front of the detectors. 



1 percent of the data due to the spatial structure of the de- 
tector elements (they are hollow). Outliers due to cosmic 
rays are usually filtered out by comparing the set of data 
values corresponding to one map pixel and masking values 
beyond a threshold (see section IST21 for details). 

The faint vertical structures in Figure [T] are in fact the 
nucleus of the galaxy which passes in front of the detectors 
multiple times. Only the nucleus is bright enough to be 
visible in the offset removed data. Projection of the data 
onto the sphere of the sky as well as filtering of the l// 1 ' 2 
noise are required in order to see the fainter parts. After 
the projection, the contributions of each pixel at each time 
index add up in order to greatly increase the visibility of 
the fainter objects. 



2.2. Instrument model 

Our optimal reconstruction scheme requires an understand- 
ing of the acquisition process, done here with an instrument 
model. 

For the rest of this article, the data sets and the sky 
maps are vectorized (i.e. re- indexed). Vectors are repre- 
sented as bold lower-case letters. Linear operators are rep- 
resented as matrices and written in bold upper-case letters. 

The PACS data are a set of samples of the bolometers 
taken at different times and thus at different positions in 
the sky (in scan mode). Each sample is proportional to 
the sky surface brightness and to the area of the detector 
projected on the sky sphere (see Figure [5]). Thus, PACS 
data and the flux of the sky are linearly related and we can 
write equation (fTJ) where y stands for the data, x denotes 
the sky pixels values and P is the projector. The noise is 



detect Dr j 



pixel i 



Fig. 2. Sketch of the intersection of the sky map pixel i 
and detector j. The matrix element of the projection 
matrix P is equal to the intersecting area of the pixel i 
with the detector j. The sky map pixels are square but the 
detectors are quadrilaterals because of the distortion of the 
optics. The distortion is exaggerated in this sketch. 



considered additive and is noted n. 



y = Px + n 



(1) 



We take into account the distortion of the field of view 
(FOV) by the optics of the instrument in the projector. 
The gains of the bolometers can be taken into account too. 
In our formalism, we would simply multiply _P by a diag- 
onal matrix with the gain map on the diagonal (assuming 
that the gain does not vary with time, the gain map would 
simply be the flat-field repeated ny times, where ny is the 
number of frames). The compression will be taken into ac- 
count separately. 

In this paper, we used an i mplementation of the PACS 
instrument model provided by IChania 1 (l2010h m prepara- 
tion. 



3. Map-making 

The map-making stage is the inverse problem associated 
with equation ([T]), that is the estimation of x knowing y. 
It can be computed using classical estimation methods. We 
now review some of the methods that have been imple- 
mented for the map-making stage. 



3.1. PACS pipeline 

The Herschel Interactive Processing Environment (HIPEj3 
offers back-projection of the data onto the sky map for the 
map-making pipeline step. In order to guarantee the conser- 
vation of the flux, the map must be divided by the coverage 
map (that is the number of times a pixel of the sky map 
has been seen by the detectors). The sky map estimation 
using the pipeline is thus described by equation 



P T 



^pipeline 



y 



p t i 



(2) 



Estimated sky maps are noted with a •. - T is the trans- 
pose of a matrix, 1 is a vector of ones. The division of two 
vectors here is the term by term division of their coeffi- 
cients. 



1 "HIPE" is a joint development by the Herschel 
Science Ground Segment Consortium, consist- 
ing of ESA, the NASA Herschel Science Center, 
and the HIFI, PACS and SPIRE consortia. (See 
http://herschel.esac.esa.int/DpHipeContributors.shtml). 
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filter 


70 /im 


100 /im 


160 /xm 


wavelength range (/im) 


60-85 


85-130 


130-210 


resolution (A/ A A) 


~ 3 


~ 2 


pixel size (arcsec) 


3.2 


6.4 


FOV (arcmin) 


3.5 x 1.75 


FWHM 


5.2 


7.7 


12 



Table 1. PACS photometer overall characteris- 
tics/performances (taken from the PACS Observer's 
Manual). The resolution is the spectral resolution of the 
bandwidth. 



This method is very fast and straightforward to imple- 
ment. But it does not provide a very accurate estimate of 
the sky. To illustrate this, one could compare this estimate 
to the least-squares estimate of the sky map. The least- 
squares estimate minimizes the criterion ||y — Pa;|| 2 and 
can be written in closed form as in equation ©. The least- 
squares solution can be seen as the maximum likelihood 
solution under the assumption of independent and identi- 
cally distributed Gaussian noise. 



(P T P) 



y 



(3) 



The pipeline solution can now be interpreted as a very 
crude approximation of the least-squares solution by re- 
placing the inversion of P T P by the inversion of the cover- 
age map. This approximation ignores the correlations intro- 
duced by the projection matrix. In a more physical interpre- 
tation, one can say that the pipeline solution does not take 
into account the possibility to make use of super-resolution 
from the PACS data. Reducing the pixel size in the pipeline 
sky map will not help on this matter. The projection matrix 
inversion is required. In fact, the PSF is not always sampled 
at the Nyquist- Shannon frequency and it is thus possible to 
obtain a better resolution in the sky map than in the raw 
data. As can be seen in Table [1] this is mainly beneficial for 
the blue filter (70 fim) in which aliasing occurs the most. 

3.2. MADmap 

There is an o ther approach called MADmap 
(jCantalupo et al.l 120 10T) which tries to better estimate 
the sky map. It makes use of a preconditioned conjugate 
gradient in order to efficiently estimate the least-squares 
solution. Furthermore, MADmap adds a more accurate 
model of the pink noise. The noise is modeled as a 
multivariate normal distribution with a covariance matrix 
D diagonal in Fourier space (equation (j3])). Noting F the 
matrix of the Fourier transform along the time axis, 
its conjugate transpose (and also its inverse), diag the 
operator which transforms vectors into diagonal matrices 
and ~ the Fourier transform of a vector or a matrix, we 
have: 

n = JV(0, D), D = F^DF = diag(d)F (4) 

This allows to store the full covariance matrix in an array 
of the size of the data, by storing the diagonal elements of 
the Fourier transform of _D, which we note here d. 



The map estimation in this case can be rewritten as a 
simple least-squares estimation following equation (|5]l. 

y <- D^y 
P' <- D^P 

& MADmap = {P T DP) 1 P T D^y 

P> T y> (5) 



(P' T P')~ J ■•' 



£>2 is such that D^D? = D. 

The difficulty resides in finding a proper estimation for 
the noise covariance matrix. One has to assume that the 
noise covariance matrix is time invariant at very long time 
scales in order to have enough data samples for its estima- 
tion. In addition, one has to assume that there is only noise 
in the data taken to estimate the noise covariance matrix. 

For Herschel/PACS data, we have data sets close 
enough to pure noise data so that estimating the noise 
statistics is not an issue. In principle, the noise covariance 
matrix should be of the size of the data set since the 1// 
noise extends to the largest scales. In practice we can per- 
form a median filtering at very large scales and use the noise 
covariance matrix estimation for intermediate scales. The 
median filter window can be made large enough to avoid 
filtering any scale corresponding to the physical sources of 
interest. This is easier for unresolved point sources but even 
for extended emission the scales of interest do not exceed 
the size of the map. 

Furthermore, the current MADmap implementation 
does not make use of an accurate projection matrix but as- 
sumes a one-to-one pixel association between sky map pixel 
and data pixel. This limits the capacity of MADmap to take 
advantage of the super-resolution possibilities of PACS data 
since the projection matrix is not accurate. Another limita- 
tion of MADmap is due to the temporal-only Fourier trans- 
form. Because of it, the covariance matrix can only model 
temporal correlation, neglecting detector to detector corre- 
lations, which have to be removed by some pre-processing. 
However, most are gone after median filtering. 

4. Compressed-Sensing 

4.1. Theory 

It is now well known that natural images and astro- 
nomical images can be sparsely represented in wavelet 
bases. This finding has been successfully applied to a 
number of problems and to co mpression in particular 
(•JP EG2000 (ISkodraset ail l200lh or Dirac video compres- 
sion (jBorer fc DaviesL l2005h l . But, until recently, it had not 
been realized that it is possible to make use of this knowl- 
edge to derive more efficient data acquisition schemes. This 
is precisely the idea behind the compressed-sensing (CS) 
theory. Compressed-sensing states that if a signal is sparse 
in a given basis, it can be recovered using fewer samples 
than would be required by the Shannon- Nyquist theorem, 
hence the term compressed-sensing. See ICandes fe Wakir] 
(2008 ) for a tutor ial in troducing comp ressed-sensing theory 
or ICandeal (|2006h an d IDonohoi (120061) fo r seminal articles. 
You can also look at IStarck et al.l (|201fH) which have a lot 
of example applications in astrophysics. 

More formally, let us assume that our signal of inter- 
est x is sparse in the basis B and we are taking samples 
y through the measuring matrix A (i.e. y = Ax). In this 
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context sparse means that a = Bx have few non-zero el- 
ements, a is said to be approximately sparse if it has few 
"significant" elements (far from zero) . The theory proposes 
two categories of measurement matrices that allow precise 
estimation of x : incoherent matrices and random matrices. 
Random measurements are likely to give incoherent mea- 
surements. Incoherence is the fundamental requirement for 
A in the compressed-sensing framework. Quantitatively, it 
means that the mutual coherence p, = maxj t k \\ m Jbk\\ is 
low where rnj and are the columns of A and B respec- 
tively. Under the assumptions of exact sparsity and inco- 
herence, it can be shown that x can be estimated exactly 
using few measurements. Precisely, if m is the number of 
measurements, s is the number of non-zero elements and n 
the number of unknow ns, we need m > C ■ p 2 • s ■ log(n), 
where C is a constant (jCandes &: Wakinl . l2008h . Under the 
approximate sparsi ty hypothesis, good performances can be 
demonstrated too (St oinic et all I2008T) . 

In order to get the benefits of CS it is assumed that it 
is possible to solve the inverse problem under the assump- 
tion of sparsity (i.e. to find a sparse solution to the inverse 
problem). In other word, we need to find the solution of 
problems of the type of equation ([6]) . 

xcs — argmin ||-Bx||o subject to \\y — Ax\\ 2 < e (6) 

X 

where |j • |jo is the Lo norm which is the number of non- 
zero elements of the vector and e is a small value linked 
to the variance of the signal. This estimation problem is of 
combinatorial complexity (all solutions need to be tested) 
and thus is intractable for our applications. 

Since equation ([S]) is of combinatori al complexity, it 
is oft en replaced by equation flT]) (see iBruckstein et al.l 
(2009)), where the Lo norm has been replaced by the Li 
norm ( ||||i is the Li norm), or the sum of the absolute 
values of the coefficients. 

&cs — argmin ||-Bx||i subject to \\y — Ax\\ 2 < e (7) 

X 

We can choose to write the estimation problem as in 
equation ([5]). It can be seen as a reformulation of equation 
(0 using Lagrange multipliers. Statistically, it can also be 
seen as the co- log-posterior (the opposite of the logarithm of 
the posterior distribution) of an inverse problem assuming 
a normal likelihood and a Laplace prior. 

xcs = argmin \\y — Ax\\ 2 + A||-Bsc||i (8) 

X 

Note that this modeling of the problem ignores some 
artifacts of PACS data such as glitches and pink noise. We 
will see how we can handle it in preprocessing. 

4.2. PACS implementation of compressed-sensing 

We now describe how we implement the CS theory to PACS 
data in practice. Let us stress that we cannot modify the 
way Herschel observes and that we are severely limited by 
computer resources on board. We choose to apply a linear 
compression matrix C to each frame. C is the compressed- 
sensing part of the acquisition model. The new inverse prob- 
lem can now be described as in equation ([5]). 

z = CPx + n (9) 




(b) 



m 1 

■ ■ 




Fig. 3. (a) Frame of the PACS imaging camera in the blue 
band. The darker and brighter isolated pixels are bad pix- 
els. There is one complete bad line (at the bottom of the 
last matrix, ordered from the left to the right and from 
the bottom to the top) . The observable structure is mainly 
the offset. Discontinuities in the offset correspond to the 
borders of the matrices of bolometers. Note that there are 
physical gaps between matrices as well as rotation and dis- 
tortion. The matrix used in the transparent mode is the 
sixth one. (b) The same frame with the offset removed. The 
brighter isolated pixels are glitches, (c) The same frame 
after Hadamard transform. The offsets have not been re- 
moved. 



Due to severe on-board computational constraints we 
cannot make use of random measurement matrices nor 
wavelet transforms. We choose the Hadamard transform 
which can be implemented in linear c omplexity us i ng th e 
fast Hadamard transform algorithm (|Pratt et all [l969). 
The compression is then made using pseudo-random dec- 
imation of the Hadamard transform of the frame. The 
Hadamard transform has been chosen since it mixes uni- 
formly all the pixels. Thus, in conjunction with the deci- 
mation, it is a good approximation of a random matrix. To 
illustrate this, one can look at the Figure [3] which displays 
the result of the Hadamard transform of a single frame of 
PACS (without removing the offset). 
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Note that our method is not CS in a strict meaning. 
Because we have more data than unknowns, we cannot tell 
from the CS theory which compression matrix is optimal. 
But we think CS friendly compression matrices are worth 
being investigated as there is no other theory that point 
to other optimal compression matrices under sparsity as- 
sumptions. We plan to test a wider range of compression 
matrices in future works. 

We also point out that the standard compression scheme 
of PACS scans (averaging) can also be modeled as in equa- 
tion ([9]) replacing C by an averaging matrix. The exact 
same inversion scheme can be applied to both the CS inver- 
sion problem and the standard inversion problem allowing 
a clear comparison by separating effects due to the acqui- 
sition matrix from those due to the sparsity constraint. 

Equation (|10[) shows the averaging compression that is 
usually performed on PACS data. In this equation I is the 
identity matrix of size the number of detectors. 



norm. The Huber norm is defined according to equation 

f U{x) = x 2 ,Vxe [-5,5] . . 

\n{x) = 2\5x\-S 2 ,Vxg[-M [ ' 

The Huber norm is a way to approximate a Li norm (|| • || i) 
with a function everywhere differentiable, which is manda- 
tory in order to use CG methods. For this purpose, the 
Huber parameter 5 can be chosen very small. 
Finally, we minimize equation (1141) . 

xcsh = argmin \\z — CPx\\ 2 

X 

+X w n{Wx) + \ x H(D x x) + \ y H{D y x) (14) 

Th is optimization scheme is very similar to iLustig et al.l 
(2007j) in the idea of applying CG iterations and replacing 
the Li norm by a close function everywhere differentiable. 



Cav — 



( I ... I 



V 



(10) 



/ 



Equation (fTTj) shows the compressed-sensing compres- 
sion that we implemented. H is the Hadamard transform of 
one frame, and Af, is a decimation matrix. A decimation 
matrix is a rectangular matrix with only ones and zeros 
values and exactly one non-zero element per row. It selects 
the elements of the Hadamard transform to be transmitted. 
For example, in the parallel mode, only one-eighth of the 
values are transmitted. The decimation matrices vary from 
one frame to the next to minimize the redundancy of the 
transmitted information. F is the compression factor. 



5. Preprocessing 

It is important to note that despite the significant alter- 
ation of the aspect of the data after CS compression (or 
any kind of linear compression), it is still possible to ap- 
ply the same preprocessing steps that are performed in the 
standard PACS pipeline. Let us demonstrate this in details. 

The following is usually applied to the data before the 
inversion step (back-projection or MADmap) : 

— Removal of the offset by removing the temporal mean 
along each timeline 

— Deglitching 

— Pink noise filtering using median filtering (removes only 
the slowly drifting part in the case of MADmap) 



M F H 



\ M i H J 

(11) 

Note that all the matrices in this paper are not im- 
plemented as matrices but rather as linear functions as it 
would not be feasible to store such large matrices and pro- 
hibitive to perform matrix multiplications. 

We define sparse priors in both wavelet space and fi- 
nite difference. The criterion can then be rewritten as in 
equation (|12p . where W is a wavelet transform and D x 
and D y are finite difference operators along both axes of 
the sky map. The finite difference operators simply com- 
pute the difference between two adjacent pixels along one 
axis or the other. Thus, as a prior they favor smooth maps. 
Since we minimize a L\ norm, it allows for some bigger 
differences, hence the map can be piecewise smooth. 

xcsh = argmin \\z — CPx\\ 2 

X 

+ \ w \\Wx\\i + X x \\D x x\\i + X v \\D v x\\i (12) 

We estimate xqsh (CSH meaning Compressed-Sensing 
for Herschel) using a conjugate gradient (CG) with Huber 



5.1. Offset and pink noise removal 

The removal of the offset is simple. Since the compression 
is linear, and applied independently on each frame, one can 
remove the offset by removing the temporal mean as in 
the averaging mode. The same applies for the pink noise 
removal. 

In order to mitigate the effect of pink noise in the es- 
timated maps, a filtering is performed during the prepro- 
cessing. Following the method of the pipeline, we use an 
high-pass median filtering. This can have a side effect of re- 
moving signal of interest at larger spatial scales in the map. 
This is of little importance in fields containing mostly point 
sources but can be dramatic in fields with structures at ev- 
ery scale such as fields taken in the galactic plane. One of 
the interests of MADmap is to be able to remove the pink 
noise artifacts in the maps without affecting the large scale 
structures. But even with MADmap-like methods, one still 
have to remove the very low frequency component of the 
pink noise not taken into account by the estimated noise 
covariance matrix. That is why it is important to ensure 
that one can perform median filtering in a CS framework. 

We perform high-pass median filtering on the CS data 
exactly in the same way as it is done with the averaged 
data. Thanks to the linearity of the CS compression model, 
the pink noise in the raw data remains pink noise in the CS 
data. Thus, the median filtering affects the noise in the CS 
data the same way it does in the averaged data. 
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The only important parameter in the median filtering is 
the length of the sliding window used to compute the me- 
dian in the neighborhood of each point. If the window is too 
large, there will be too much pink noise remaining in the 
filtered data. If it is too small, large scale structures in the 
map will start to get affected. As a compromise, we choose 
the window to be of the size of a scan leg, which is approxi- 
mately the largest scale in the map. In the NGC6946 maps 
this corresponds very roughly to a window of 1000 samples 
at a sample rate of 40 Hz at a speed of 60 arcsec/s. For the 
compressed data, we divide this length by the compression 
factor. 



5.2. Deglitching 

The deglitching is more complicated since it makes use of 
spatial information. In order to apply deglitching in the CS 
case, we first decompress the data without projecting onto 
the sky map (equation ([15])) ■ 



y = (C T C)- l C T z 



(15) 



We can then perform deglitching on y as we would 
have done on averaged data. We perform deglitching on 
y using a method called second level deglitching. It con- 
sists of regrouping all data samples that correspond to a 
given pixel map and masking out the ones that are be- 
yond a given threshold using the median absolute deviation 
(MAD) method. 

Note that any operation that is performed on standard 
data could be applied on CS data using equation (fT5"|) . This 
is a very important advantage of using a linear compression 
method. Other compression methods, such as JPEG for 
instance, do not satisfy this property, since a thresholding is 
performed. It is also the linearity of the compression which 
allows to jointly perform the decompression and the map- 
making steps of the inversion and thus to obtain better 
maps. 

When C is an averaging matrix (the standard compres- 
sion), the samples that are found to be glitches are simply 
masked. In our case, we define a masking matrix M which 
is simply an identity matrix with zeros on diagonal elements 
corresponding to where values are masked. We can finally 
perform the inversion replacing CP by CMP in equation 
(TT4"|) . This way, the inversion is done without altering the 
data set but entries corresponding to glitches are not taken 
into account in the inversion. 

As can be seen in the maps in Figure HI this is an ef- 
fective method since no glitch remains visible. To ensure 
optimal detectability of the glitches it is best to perform 
pink noise removal beforehand. One can use a shorter win- 
dow for the median filtering to better remove the pink noise 
only for the deglitching step. Indeed, in this case, it does 
not matter if the map structures are affected since the fil- 
tered data are only used to compute a mask for the glitches. 
One can then apply the mask on the data filtered with a 
larger window to preserve large scale structures. 

5.3. Conclusion 

To summarize, we can perform the same processing as the 
pipeline in the CS mode. There is no obstacle remaining 
which would prevent us from applying CS to PACS data. 




0.00005 
0.00000 
-0.00005 



Fig. 5. Line cuts along the two sources displayed in the 
zoom of figure [4j Cuts are made by two-dimensional linear 
interpolation of the maps. The x axis is the pixel index of 
the interpolation. The y axis is the intensity in Jy. 



Let us see now the results we obtained with CS on real 
data. 



6. Results 

We present results of sky map estimation using PACS data 
observations in transparent mode. The different modes of 
compression are simulated. (PL) is the pipeline mode, i.e. a 
simple weighted backprojection ignoring that the data have 
been compressed (see equation [5]) . Compression is done by 
averaging. (PLI) is a full inversion of the pipeline model. It 
means it does not take into account the correct compression 
model. Data for (PLI) have been compressed by averaging. 
In other words (PLI) is the same as (PL) replacing the 
backprojection with a full inversion. (ACI) is the inversion 
of the model including the compression by averaging, (HCI) 
is the inversion with Hadamard compression and (NOC) is 
the inversion of the full data set (not compressed). All these 
methods but (PL) use the same sparse inversion algorithm 
described in section l4~2l 

The compression factor on the acquisition side is al- 
ways 8 in the following results except for the reference map 
(Figure 2^) for which no compression is applied. The data 
have been taken in fast scan mode (60 arcsec/s). Those are 
the conditions that give the most important loss of reso- 
lution in standard (averaging) mode since the blurring of 
each frame is the most important. The data set is made of 
two cross scans in the blue band of approximately 60000 
frames each. Each frame has 256 pixels (instead of 2048 
because of the transparent mode). The estimated map is 
made of 192 x 192 pixels of 3 arcsec each, resulting in a 10 
arcmin 2 map. The object observed is the galaxy NGC6946 
from the KINGFISH program (P.I. R. Kennicutt) but the 
observations are dedicated calibration observations. Results 
are presented in Figure 0J Cuts along a line going through 
the two sources in the zoom are presented in Figure [5] 

To illustrate the impact on the spatial resolution of 
the different reconstruction methods, we list in Table [2] 
the mean of the full width at half maximum (FWHM, in 
pixels) derived from Gaussian fits to four compact sources 
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Fig. 4. Map estimations of NGC 6946 using (a) the pipeline (PL) back-projection method , (b) the pipeline model 
inversion (PLI) (without taking into account the compression) , (c) averaging compression inversion ( ACI) , (d) Hadamard 
compression inversion (HCI) and (e) reference map without any compression (NOC). Maps are presented on a fourth 
root intensity scale. All maps have the same scale. 



Nicolas Barbey et at: Compressed-sensing and sparse map-making with Herschel/PACS 



9 



FWHM (pixels) 


(PL) 


(PLI) 


(ACI) 


(HCI) 


(NOC) 


mean of 4 sources 


4.30 


4.01 


2.75 


3.27 


3.15 



Table 2. FWHM of sources in pixels in Figure E] estimated 
using 2-dimensional Gaussian fits of 4 isolated sources ran- 
domly chosen from the map. 



Maps 


(PL) 


(PLI) 


(ACI) 


(HCI) 


(NOC) 


std (10~ b Jy) 


4.84 


14.3 


19.2 


20.3 


8.3 



Table 3. Standard deviation for each map in Figure 2] 
Those values are estimated from the corners of the maps 
(~ 50 x 50 pixels area at each corner for a total of ~ 10000 
pixels in each map). 



pa n 


in 


ra O 


dec (") 


52.5 


46 


308.71 


60.153 



Table 4. Parameters of the NGC 6946 galaxy. PA is the 
position angle, i is the inclination, ra is the right ascension 
and dec is the declination. 



Maps 


(PL) 


(PLI) 


(ACI) 


(HCI) 


(NOC) 


flux (Jy) 


0.65 


0.62 


0.64 


0.67 


0.63 



Table 5. Total flux of NGC 6946 in Jy as estimated from 
the maps of Figure |U 



selected on the field. Although these sources are all not- 
strictly point-like, comparison of the achieved mean FWHM 
is quite instructive. 

To compare quantitatively the global results, we made 
radial profiles of NGC 6946 using maps from each method. 
Radial profiles were obtained using galaxy parameters from 
the NASA Extragalactic Database (NED) and summed up 
in Table SJ Surface brightness is averaged over pixels con- 
tained in concentric ellipses. Background values are esti- 
mated using corners of the maps where no source can be 
seen. The standard deviation of the pixels in the back- 
ground areas give an estimate of the standard deviation 
of the noise in the whole map. Those values are reported in 
Table [3] 

Results are presented in Figure [5J Figure [SJi shows 
radial profiles corresponding to each map from figure El 
Those profiles are supplemented with error bars estimated 
from the background measurements given in Table [3] We 
assumed normal noise meaning that the standard devia- 
tion of the surface brightness at each radius is given by 
^profile = a / bkg , where N(r) is the number of pixels in at 



radius r and <Jbkg is the standard deviation of the pixels in 
the background. 

Figure m> shows the ratio of those radial profiles over 
the radial profile of the reference map without compression. 
Note that the reference map is not guaranteed to be the 
ground truth. In particular, it is affected with the potential 
issues of the median filtering (used to remove pink noise) 
as much as the other maps. However, we assume it is closer 
to the ground truth since it does not suffer from a loss of 
information due to compression. 

We deduced the total flux of NGC6946 from the maps 
in Figure 2] and we display the result in Table [5] 



7. Discussion 

Comparing Figures 2£i and0j3, there is a small gain in reso- 
lution. The FWHM estimate in table[2]illustrates this small 
gain since the FWHM drops from 4.3 pixels to 4.0 pixels. 
This gain comes from the inversion of the projection matrix. 
Physically, it means that we used the pointing information 
to get a super-resolved map. Note that this results holds 
only for the blue band which is not sampled at the Nyquist 
frequency. 

Looking at Figure 0] maps ACI and HCI are better re- 
solved than maps PL and PLI. This is supported by the 
quantitative results presented in Table [2] Figure [5] shows 
that this gain in resolution allows close sources to be better 
separated. From Table [5J the gain in resolution is approx- 
imately 1 pixel, or 3 arcseconds. This gain is due to the 
inversion of linear models which take into account the com- 
pression step and to the inclusion of sparsity priors in the 
inversion algorithm. This is common to both ACI and HCI 
and not intrinsic to CS. 

Comparing ACI and HCI in Figure SJ it appears that 
the structure of the remaining noise is different. HCI does 
not exhibit linear patterns in the noise, following the scan 
direction. This is due to the mixing of the pink noise compo- 
nent of each detector done by the Hadamard compression. 
Furthermore, Table [5] shows that ACI has a mean FWHM 
below NOC. This is unexpected since NOC has more data 
than ACI. But this can be explained by the median filtering 
performed to filter the pink noise. Median filtering can have 
a small effect on compact sources even when large sliding 
windows are used. On the opposite, HCI seems more robust 
to this median filtering since the mean FWHM in the HCI 
map is just slightly above the NOC mean FWHM and not 
below it. 

Of course, these lossy compression schemes still imply a 
loss of information but it mostly translates into an increase 
of the uncertainty of the map estimation (i.e. the noise). 
This is true for both compression matrices. 

Note, however, that comparing results between CS and 
averaging compression matrices does not allow to conclude 
on the absolute performance of compressed-sensing. To con- 
clude on this point, a study is needed to find the best CS 
compression matrix. In this paper we only try to highlight 
the applicability of CS methods to real data from a space 
observatory. 

The amount of residual noise can seem surprising if one 
does not remember that the noise in PACS data is mostly 
a pink noise, and thus is not well taken into account in our 
inversion model. We need to make use of the noise covari- 
ance matrix in our inversion scheme. In order to do so, we 
can update equation (|14[) by taking into account a noise co- 
variance matrix. This can be done using methods inspired 
by MADmap. We will investigate this possibility in a forth- 
coming article. However, since the resolution is increased in 
our results, the noise increase cannot be directly translated 
into a sensitivity decrease. 

In figure (JB^i), we can see that the radial profiles ob- 
tained using the maps displayed in figure (0]) are very close 
to each other. It gives confidence in the capability of our 
methods to preserve the flux in the maps at larger scales. 
In particular, this shows that the high-pass filtering is not 
affecting the profiles differently for CS than for averaging 
compression. It validates the way we perform this prepro- 
cessing step in the CS pipeline. 
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Fig. 6. (a) Radial profiles of NGC 6946 in surface brightness (Jy) as a function of distance from the center of the galaxy 
(arcsec). Error bars are inferred from the background measurements reported in Tableland from the number of pixels 
at in the ellipses at each radius, (b) Ratio of radial profiles of the different maps over the radial profile of the reference 
map obtained without compression. Radial profiles are obtained from the maps in Figure 01 Acronyms refers to Figure 
® 



Figure (JSJd) shows the relative discrepancies between the 
profiles. They are higher further from the center since the 
surface brightness are very low, but they are consistent with 
the uncertainties. Again the CS compression scheme and 
the averaging compression scheme are very similar. 

Profiles from compressed maps seem to be systemat- 
ically slightly over-estimated between and 150 arcsec- 
onds from the center, especially for (HCI) and (PL) maps. 
However, as shown in Table [S] the total flux of the galaxy 
computed for each map are within 7 % of the flux computed 
from the reference map. This result is in full accordance 
with the photometric accuracy quoted by the PACS team 
(jPoglitsch et all 120101) . 

We presented results for a resolved galaxy only but 
our method could benefit other objects. For example, deep 
survey observations are limited by the confusion limit at 
which the presence of numerous unresolved sources ap- 
pear as noise. We plan to see if the confusion limit could 
be improved with our approach in future work. The gain 
in resolution could also help with regard to the source 
blending issue and could help improve the precision of 
catalogs of isolated unresolved sour ces (as estimated by 
SExtractor (jBertin fc Arnoutsl . Il996[ ). for instance). This 
could be tested o n simulated f i eld of sources with known 
positions. See also lPence et al.l (|2010D for a study on com- 
pression method using this idea. 

8. Conclusions 

In this paper we showed that methods inspired by the 
compressed-sensing theory can successfully be applied to 



real Herschel/PACS data, taking into account all the in- 
strumental effects. For this purpose, special PACS data 
was acquired in the "transparent mode" in which no com- 
pression is applied. We then performed compressed-sensing 
compression scheme as well as the standard averaging com- 
pression scheme on the very same data which allows for an 
unbiased comparison. The data arc affected by various ar- 
tifacts such as pink noise and glitches. Handling of these 
artifacts is well known on standard averaged data but we 
show that the same kind of techniques can successfully be 
applied on compressed-sensing data. 

Furthermore, we showed that it is possible to signifi- 
cantly improve the resolution in the sky maps compared to 
what can be obtained using the pipeline. This is done by 
taking into account the compression matrix in the acqui- 
sition model and by performing a proper inversion of the 
corresponding linear problem under sparsity constraints. 

For now the resulting sky maps in CS and averaging 
compression modes are of the same quality. However, it is 
important to note that we may not be applying the best 
CS method. In particular, the Hadamard transform is only 
one compression matrix among others which satisfies the 
requirements for a CS application. Thus, this article does 
not in any way conclude on the performance of CS over 
standard compression. We can only conclude that this par- 
ticular approach inspired by the CS framework and this 
particular Hadamard transform give as good results as the 
averaging compression matrix. But choosing another com- 
pression matrix could further improve the CS results over 
averaging. 
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We plan to more systematically analyse other choices 
for the compression matrix in a further article. Other im- 
provements can be expected by performing deconvolution 
of the instrument PSF during the map-making stage and 
including the noise covariance matrix as in MADmap. 
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